Principle Of Maximum Entropy

The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data (such as a proposition that expresses testable information). Another way of stating this: Take precisely stated prior data or testable information about a probability distribution function. Consider the set of all trial probability distributions that would encode the prior data. According to this principle, the distribution with maximal information entropy is the best choice.


History

The principle was first expounded by E. T. Jaynes in two papers in 1957, where he emphasized a natural correspondence between statistical mechanics and information theory. In particular, Jaynes offered a new and very general rationale for why the Gibbsian method of statistical mechanics works. He argued that the entropy of statistical mechanics and the information entropy of information theory are basically the same thing. Consequently, statistical mechanics should be seen just as a particular application of a general tool of logical inference and information theory.


Overview

In most practical cases, the stated prior data or testable information is given by a set of conserved quantities (average values of some moment functions), associated with the probability distribution in question. This is the way the maximum entropy principle is most often used in statistical thermodynamics. Another possibility is to prescribe some symmetries of the probability distribution. The equivalence between conserved quantities and corresponding symmetry groups implies a similar equivalence for these two ways of specifying the testable information in the maximum entropy method.

The maximum entropy principle is also needed to guarantee the uniqueness and consistency of probability assignments obtained by different methods, statistical mechanics and logical inference in particular.

The maximum entropy principle makes explicit our freedom in using different forms of prior data. As a special case, a uniform prior probability density (Laplace's principle of indifference, sometimes called the principle of insufficient reason) may be adopted. Thus, the maximum entropy principle is not merely an alternative way to view the usual methods of inference of classical statistics, but represents a significant conceptual generalization of those methods. However, these statements do not imply that thermodynamical systems need not be shown to be ergodic to justify treatment as a statistical ensemble.

In ordinary language, the principle of maximum entropy can be said to express a claim of epistemic modesty, or of maximum ignorance. The selected distribution is the one that makes the least claim to being informed beyond the stated prior data, that is to say the one that admits the most ignorance beyond the stated prior data.


Testable information

The principle of maximum entropy is useful explicitly only when applied to ''testable information''. Testable information is a statement about a probability distribution whose truth or falsity is well-defined. For example, the statements
:the expectation of the variable x is 2.87
and
:p_2 + p_3 > 0.6
(where p_2 and p_3 are probabilities of events) are statements of testable information.

Given testable information, the maximum entropy procedure consists of seeking the probability distribution which maximizes information entropy, subject to the constraints of the information. This constrained optimization problem is typically solved using the method of Lagrange multipliers.

Entropy maximization with no testable information respects the universal "constraint" that the sum of the probabilities is one. Under this constraint, the maximum entropy discrete probability distribution is the uniform distribution,
:p_i = \frac{1}{n} \quad \text{for all } i \in \{1, \dots, n\}.
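As a brief sketch of why the uniform distribution results when normalization is the only constraint, the Lagrange multiplier calculation can be written out as follows. The symbols \mathcal{L} (the Lagrangian) and \mu (the multiplier for normalization) are introduced here for illustration only, and the logarithm is taken to be natural.

```latex
\begin{align}
\mathcal{L}(p_1,\dots,p_n,\mu) &= -\sum_{i=1}^n p_i \log p_i + \mu\left(\sum_{i=1}^n p_i - 1\right), \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\log p_i - 1 + \mu = 0
\quad\Longrightarrow\quad p_i = e^{\mu - 1} \text{ for every } i, \\
\sum_{i=1}^n p_i = 1 &\quad\Longrightarrow\quad p_i = \frac{1}{n}.
\end{align}
```

Because the entropy is strictly concave in the p_i, this stationary point is the unique maximum.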


Applications

The principle of maximum entropy is commonly applied in two ways to inferential problems:


Prior probabilities

The principle of maximum entropy is often used to obtain prior probability distributions for Bayesian inference. Jaynes was a strong advocate of this approach, claiming the maximum entropy distribution represented the least informative distribution. A large amount of literature is now dedicated to the elicitation of maximum entropy priors and links with channel coding.


Posterior probabilities

Maximum entropy is a sufficient updating rule for radical probabilism. Richard Jeffrey's probability kinematics is a special case of maximum entropy inference. However, maximum entropy is not a generalisation of all such sufficient updating rules.


Maximum entropy models

Alternatively, the principle is often invoked for model specification: in this case the observed data itself is assumed to be the testable information. Such models are widely used in natural language processing. An example of such a model is logistic regression, which corresponds to the maximum entropy classifier for independent observations.


Probability density estimation

One of the main applications of the maximum entropy principle is in discrete and continuous density estimation. Similar to support vector machine estimators, the maximum entropy principle may require the solution to a quadratic programming problem, and thus provides a sparse mixture model as the optimal density estimator. One important advantage of the method is its ability to incorporate prior information in the density estimation.


General solution for the maximum entropy distribution with linear constraints


Discrete case

We have some testable information ''I'' about a quantity ''x'' taking values in \{x_1, x_2, \ldots, x_n\}. We assume this information has the form of ''m'' constraints on the expectations of the functions f_k; that is, we require our probability distribution to satisfy the moment inequality/equality constraints:
:\sum_{i=1}^n \Pr(x_i) f_k(x_i) \geq F_k \qquad k = 1, \ldots, m,
where the F_k are observables. We also require the probabilities to sum to one, which may be viewed as a primitive constraint on the constant function equal to 1, with an observable equal to 1, giving the constraint
:\sum_{i=1}^n \Pr(x_i) = 1.
The probability distribution with maximum information entropy subject to these inequality/equality constraints is of the form:
:\Pr(x_i) = \frac{1}{Z(\lambda_1,\ldots,\lambda_m)} \exp\left[\lambda_1 f_1(x_i) + \cdots + \lambda_m f_m(x_i)\right]
for some \lambda_1,\ldots,\lambda_m. It is sometimes called the Gibbs distribution. The normalization constant is determined by:
:Z(\lambda_1,\ldots,\lambda_m) = \sum_{i=1}^n \exp\left[\lambda_1 f_1(x_i) + \cdots + \lambda_m f_m(x_i)\right]
and is conventionally called the partition function. (The Pitman–Koopman theorem states that the necessary and sufficient condition for a sampling distribution to admit sufficient statistics of bounded dimension is that it have the general form of a maximum entropy distribution.)

The \lambda_k parameters are Lagrange multipliers. In the case of equality constraints their values are determined from the solution of the nonlinear equations
:F_k = \frac{\partial}{\partial \lambda_k} \log Z(\lambda_1,\ldots,\lambda_m).
In the case of inequality constraints, the Lagrange multipliers are determined from the solution of a convex optimization program with linear constraints. In both cases, there is in general no closed-form solution, and the computation of the Lagrange multipliers usually requires numerical methods.
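A minimal numerical sketch of the equality-constrained discrete case, assuming a single constraint function f_1(x) = x (a fixed mean) on the faces of a six-sided die. The function name `maxent_distribution`, the target mean of 4.5, and the bisection bracket are illustrative choices, not part of the text above. The multiplier is found by solving F_1 = ∂ log Z / ∂λ_1 with plain bisection, which works here because the constrained mean is an increasing function of λ_1.

```python
import math

def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy distribution on `values` with a prescribed mean.

    Solves F = d/d(lambda) log Z(lambda) for the single constraint
    f(x) = x by bisection and returns (lambda, probabilities).
    The bracket [lo, hi] is an assumption suited to small examples.
    """
    def gibbs_weights(lam):
        # Subtract the largest exponent before exponentiating for stability.
        shift = max(lam * v for v in values)
        return [math.exp(lam * v - shift) for v in values]

    def mean_at(lam):
        weights = gibbs_weights(lam)
        z = sum(weights)
        return sum(w * v for w, v in zip(weights, values)) / z

    # The mean is increasing in lambda (its derivative is a variance),
    # so bisection converges to the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) < target_mean:
            lo = mid
        else:
            hi = mid

    lam = 0.5 * (lo + hi)
    weights = gibbs_weights(lam)
    z = sum(weights)
    return lam, [w / z for w in weights]

faces = [1, 2, 3, 4, 5, 6]
lam, probs = maxent_distribution(faces, 4.5)
print(lam, probs)
```

For a target mean above 3.5 the multiplier comes out positive (roughly 0.37 for a mean of 4.5), so the probabilities increase geometrically with the face value. With several constraints, a general-purpose root finder or convex optimizer would replace the bisection.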


Continuous case

For continuous distributions, the Shannon entropy cannot be used, as it is only defined for discrete probability spaces. Instead Edwin Jaynes (1963, 1968, 2003) gave the following formula, which is closely related to the relative entropy (see also differential entropy).
:H_c = -\int p(x) \log\frac{p(x)}{q(x)}\,dx
where ''q''(''x''), which Jaynes called the "invariant measure", is proportional to the limiting density of discrete points. For now, we shall assume that ''q'' is known; we will discuss it further after the solution equations are given.

A closely related quantity, the relative entropy, is usually defined as the Kullback–Leibler divergence of ''p'' from ''q'' (although it is sometimes, confusingly, defined as the negative of this). The inference principle of minimizing this, due to Kullback, is known as the Principle of Minimum Discrimination Information.

We have some testable information ''I'' about a quantity ''x'' which takes values in some interval of the real numbers (all integrals below are over this interval). We assume this information has the form of ''m'' constraints on the expectations of the functions f_k, i.e. we require our probability density function to satisfy the inequality (or purely equality) moment constraints:
:\int p(x) f_k(x)\,dx \geq F_k \qquad k = 1, \dotsc, m,
where the F_k are observables. We also require the probability density to integrate to one, which may be viewed as a primitive constraint on the constant function equal to 1, with an observable equal to 1, giving the constraint
:\int p(x)\,dx = 1.
The probability density function with maximum H_c subject to these constraints is:
:p(x) = \frac{1}{Z(\lambda_1,\dotsc,\lambda_m)} q(x) \exp\left[\lambda_1 f_1(x) + \dotsb + \lambda_m f_m(x)\right]
with the partition function determined by
:Z(\lambda_1,\dotsc,\lambda_m) = \int q(x) \exp\left[\lambda_1 f_1(x) + \dotsb + \lambda_m f_m(x)\right]\,dx.
As in the discrete case, in the case where all moment constraints are equalities, the values of the \lambda_k parameters are determined by the system of nonlinear equations:
:F_k = \frac{\partial}{\partial \lambda_k} \log Z(\lambda_1,\dotsc,\lambda_m).
In the case with inequality moment constraints the Lagrange multipliers are determined from the solution of a convex optimization program.

The invariant measure function ''q''(''x'') can be best understood by supposing that ''x'' is known to take values only in the bounded interval (''a'', ''b''), and that no other information is given. Then the maximum entropy probability density function is
:p(x) = A \cdot q(x), \qquad a < x < b
where ''A'' is a normalization constant. The invariant measure function is actually the prior density function encoding 'lack of relevant information'. It cannot be determined by the principle of maximum entropy, and must be determined by some other logical method, such as the principle of transformation groups or marginalization theory.
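As a worked illustration of the solution equations, consider a case chosen here purely as an assumption: ''x'' ranges over (0, \infty), the invariant measure is taken as q(x) = 1, and there is a single equality constraint on the mean, f_1(x) = x with F_1 = \mu > 0 (the symbol \mu is introduced only for this example).

```latex
\begin{align}
Z(\lambda_1) &= \int_0^\infty e^{\lambda_1 x}\,dx = -\frac{1}{\lambda_1} \qquad (\lambda_1 < 0), \\
F_1 = \mu &= \frac{\partial}{\partial \lambda_1} \log Z(\lambda_1) = -\frac{1}{\lambda_1}
\quad\Longrightarrow\quad \lambda_1 = -\frac{1}{\mu}, \\
p(x) &= \frac{1}{Z(\lambda_1)}\, e^{\lambda_1 x} = \frac{1}{\mu}\, e^{-x/\mu}.
\end{align}
```

Under these assumptions the maximum entropy density is the exponential distribution with mean \mu. A different choice of q(x) would change the result, which is why the invariant measure has to be fixed by a separate argument.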


Examples

For several examples of maximum entropy distributions, see the article on maximum entropy probability distributions.


Justifications for the principle of maximum entropy

Proponents of the principle of maximum entropy justify its use in assigning probabilities in several ways, including the following two arguments. These arguments take the use of Bayesian probability as given, and are thus subject to the same postulates.


Information entropy as a measure of 'uninformativeness'

Consider a discrete probability distribution among m mutually exclusive propositions. The most informative distribution would occur when one of the propositions was known to be true. In that case, the information entropy would be equal to zero. The least informative distribution would occur when there is no reason to favor any one of the propositions over the others. In that case, the only reasonable probability distribution would be uniform, and then the information entropy would be equal to its maximum possible value, \log m. The information entropy can therefore be seen as a numerical measure which describes how uninformative a particular probability distribution is, ranging from zero (completely informative) to \log m (completely uninformative).

By choosing to use the distribution with the maximum entropy allowed by our information, the argument goes, we are choosing the most uninformative distribution possible. To choose a distribution with lower entropy would be to assume information we do not possess. Thus the maximum entropy distribution is the only reasonable distribution. The dependence of the solution on the dominating measure (the invariant measure q(x) of the continuous case) is, however, a source of criticisms of the approach, since this dominating measure is in fact arbitrary.
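The range from zero to \log m can be checked numerically. The following is a small sketch, assuming natural logarithms and m = 4 propositions; the helper name `entropy` and the three example distributions are illustrative only.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

m = 4
certain = [1.0, 0.0, 0.0, 0.0]   # one proposition known to be true
skewed = [0.7, 0.1, 0.1, 0.1]    # some reason to favor the first proposition
uniform = [1.0 / m] * m          # no reason to favor any proposition

h_certain = entropy(certain)
h_skewed = entropy(skewed)
h_uniform = entropy(uniform)
print(h_certain, h_skewed, h_uniform, math.log(m))
```

The entropy of the certain assignment is zero, that of the uniform assignment equals \log m, and the skewed assignment falls strictly in between.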


The Wallis derivation

The following argument is the result of a suggestion made by Graham Wallis to E. T. Jaynes in 1962. It is essentially the same mathematical argument used for the Maxwell–Boltzmann statistics in statistical mechanics, although the conceptual emphasis is quite different. It has the advantage of being strictly combinatorial in nature, making no reference to information entropy as a measure of 'uncertainty', 'uninformativeness', or any other imprecisely defined concept. The information entropy function is not assumed ''a priori'', but rather is found in the course of the argument; and the argument leads naturally to the procedure of maximizing the information entropy, rather than treating it in some other way.

Suppose an individual wishes to make a probability assignment among m mutually exclusive propositions. He has some testable information, but is not sure how to go about including this information in his probability assessment. He therefore conceives of the following random experiment. He will distribute N quanta of probability (each worth 1/N) at random among the m possibilities. (One might imagine that he will throw N balls into m buckets while blindfolded. In order to be as fair as possible, each throw is to be independent of any other, and every bucket is to be the same size.) Once the experiment is done, he will check if the probability assignment thus obtained is consistent with his information. (For this step to be successful, the information must be a constraint given by an open set in the space of probability measures.) If it is inconsistent, he will reject it and try again. If it is consistent, his assessment will be
:p_i = \frac{n_i}{N}
where p_i is the probability of the i-th proposition, while n_i is the number of quanta that were assigned to the i-th proposition (i.e. the number of balls that ended up in bucket i).

Now, in order to reduce the 'graininess' of the probability assignment, it will be necessary to use quite a large number of quanta of probability. Rather than actually carry out, and possibly have to repeat, the rather long random experiment, the protagonist decides to simply calculate and use the most probable result. The probability of any particular result is the multinomial distribution,
:\Pr(\mathbf{p}) = W \cdot m^{-N}
where
:W = \frac{N!}{n_1! \, n_2! \, \dotsb \, n_m!}
is sometimes known as the multiplicity of the outcome.

The most probable result is the one which maximizes the multiplicity W. Rather than maximizing W directly, the protagonist could equivalently maximize any monotonic increasing function of W. He decides to maximize
:\begin{align} \frac{1}{N} \log W &= \frac{1}{N} \log \frac{N!}{n_1! \, n_2! \, \dotsb \, n_m!} \\ &= \frac{1}{N} \log \frac{N!}{(Np_1)! \, (Np_2)! \, \dotsb \, (Np_m)!} \\ &= \frac{1}{N} \left( \log N! - \sum_{i=1}^m \log((Np_i)!) \right). \end{align}
At this point, in order to simplify the expression, the protagonist takes the limit as N \to \infty, i.e. as the probability levels go from grainy discrete values to smooth continuous values. Using Stirling's approximation, he finds
:\begin{align} \lim_{N \to \infty} \left( \frac{1}{N} \log W \right) &= \frac{1}{N} \left( N \log N - \sum_{i=1}^m Np_i \log(Np_i) \right) \\ &= \log N - \sum_{i=1}^m p_i \log(Np_i) \\ &= \log N - \log N \sum_{i=1}^m p_i - \sum_{i=1}^m p_i \log p_i \\ &= \left( 1 - \sum_{i=1}^m p_i \right) \log N - \sum_{i=1}^m p_i \log p_i \\ &= -\sum_{i=1}^m p_i \log p_i \\ &= H(\mathbf{p}). \end{align}
All that remains for the protagonist to do is to maximize entropy under the constraints of his testable information. He has found that the maximum entropy distribution is the most probable of all "fair" random distributions, in the limit as the probability levels go from discrete to continuous.
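The limit in the Wallis argument can be checked numerically. The sketch below, assuming an example assignment p = (0.5, 0.3, 0.2) and a handful of values of N chosen so that every Np_i is an integer, evaluates (1/N) log W exactly through the log-gamma function and compares it with the entropy H(p). The function names are illustrative.

```python
import math

def log_multiplicity_per_quantum(counts):
    """Return (1/N) * log W, where W = N! / (n_1! ... n_m!)."""
    n_total = sum(counts)
    # lgamma(n + 1) equals log(n!), which avoids huge factorials.
    log_w = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_w / n_total

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.3, 0.2]
h = entropy(probs)

gaps = []
for n_total in (10, 100, 1000, 10000, 100000):
    # For these N the products N * p_i are whole numbers; round() only
    # removes floating-point noise.
    counts = [round(p * n_total) for p in probs]
    gaps.append(h - log_multiplicity_per_quantum(counts))

print(gaps)
```

The gap between H(p) and (1/N) log W stays positive and shrinks as N grows, consistent with the limit derived above. The terms dropped by Stirling's approximation are of order (log N)/N.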


Compatibility with Bayes' theorem

Giffin and Caticha (2007) state that Bayes' theorem and the principle of maximum entropy are completely compatible and can be seen as special cases of the "method of maximum relative entropy". They state that this method reproduces every aspect of orthodox Bayesian inference methods. In addition, they state that this method opens the door to tackling problems that could not be addressed by either the maximum entropy principle or orthodox Bayesian methods individually. Moreover, contributions by Lazar (2003) and Schennach (2005) show that frequentist relative-entropy-based inference approaches (such as empirical likelihood and exponentially tilted empirical likelihood; see e.g. Owen 2001 and Kitamura 2006) can be combined with prior information to perform Bayesian posterior analysis.

Jaynes stated Bayes' theorem was a way to calculate a probability, while maximum entropy was a way to assign a prior probability distribution.

It is, however, possible in concept to solve for a posterior distribution directly from a stated prior distribution using the principle of minimum cross entropy (the principle of maximum entropy being the special case in which the given prior is a uniform distribution), independently of any Bayesian considerations, by treating the problem formally as a constrained optimisation problem with the entropy functional as the objective function. For the case of given average values as testable information (averaged over the sought-after probability distribution), the sought-after distribution is formally the Gibbs (or Boltzmann) distribution, whose parameters must be solved for in order to achieve minimum cross entropy and satisfy the given testable information.


Relevance to physics

The principle of maximum entropy bears a relation to a key assumption of the kinetic theory of gases known as molecular chaos or ''Stosszahlansatz''. This asserts that the distribution function characterizing particles entering a collision can be factorized. Though this statement can be understood as a strictly physical hypothesis, it can also be interpreted as a heuristic hypothesis regarding the most probable configuration of particles before colliding.


See also

* Akaike information criterion
* Dissipation
* Info-metrics
* Maximum entropy classifier
* Maximum entropy probability distribution
* Maximum entropy spectral estimation
* Maximum entropy thermodynamics
* Principle of maximum caliber
* Thermodynamic equilibrium
* Molecular chaos


References

* Giffin, A. and Caticha, A., 2007, ''Updating Probabilities with Data and Moments''.
* Jaynes, E. T., 1986 (new version online 1996), "Monkeys, kangaroos and", in ''Maximum-Entropy and Bayesian Methods in Applied Statistics'', J. H. Justice (ed.), Cambridge University Press, Cambridge, p. 26.
* Kapur, J. N.; and Kesavan, H. K., 1992, ''Entropy Optimization Principles with Applications'', Boston: Academic Press.
* Kitamura, Y., 2006, ''Empirical Likelihood Methods in Econometrics: Theory and Practice'', Cowles Foundation Discussion Papers 1569, Cowles Foundation, Yale University.
* Owen, A. B., 2001, ''Empirical Likelihood'', Chapman and Hall/CRC.


Further reading

* Ratnaparkhi, A., 1997, "A simple introduction to maximum entropy models for natural language processing", Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania. An easy-to-read introduction to maximum entropy methods in the context of natural language processing.